And so begins my first ever analysis. Using the skills I’ve learnt
throughout the course I aim to showcase them through this project,
following the data analysis process of : Ask, Prepare, Process,
Analyse, Share, and Act.
Initially when beginning the project I had the aim to use the tools I
learnt during this course which would be:
- Use Google sheets to do initial cleaning.
- Use SQL - BigQuery for sorting & analysis.
- Finish off on R programming/Tableau for further analysis,
visualisation and reporting.
Unfortunately due to the size of
the data sets I was working on, Google sheets struggled & so did
excel when I attempted there. I knew I would be limited and would have
to improvise with BigQuery, but as I tried to upload a single uncleaned
file I got over 200 errors… Thus I ended up using Microsoft’s Power
Query to do the cleaning phase required for this project. As it was
something I was planning of learning next, using power query was the
saving grace that I will detail below.
Time is hard to come by, I’ve
included a summary of this project on my github in the
README apposed to this full review.
Three people who I’d like to
mention are:
Equitable
Equations: whose video’s I started to watch alongside this course’s
module on R programming. His content has helped me alot during this
project, as I was supplementing my learning from the course with his
content.
Kevin
Stratvert: apart from Microsofts own power query tutorials I
referred to his video’s in learning power query. Additionally throughout
the course when learning about spreadsheets, I watched his video’s to
see the microsoft way of doing the steps shown on google sheets.
Alex The Analyst:
With little knowledge on uploading of projects online I looked at his
video’s on how to build the website via github, he has project
walkthrough’s which I might attempt at a later point, especially the SQL
ones.
I’ve been hired as a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director believes the future success of the company depends on the growth of annual membership. So looking at the data from the last 12 months I hope to understand how the companies casual and annual members differ. From there I am to make 3 suggesting with my findings for new marketing strategies that could convert current casual membership to annual membership.
In 2016, Cyclistic launched a successful bike-share offering. Since
then, the program has grown to a fleet of 5,824 bicycles that are
geotracked and locked into a network of 692 stations across Chicago. The
bikes can be unlocked from one station and returned to any other station
in the system anytime. Until now, Cyclistic’s marketing strategy relied
on building general awareness and appealing to broad consumer segments.
One approach that helped make these things possible was the flexibility
of its pricing plans: single-ride passes, full-day passes, and annual
memberships. Customers who purchase single-ride or full-day passes are
referred to as casual riders. Customers who purchase annual memberships
are Cyclistic members.
The company has provided historical trip data in the form of csv
files that are divided by month to analyse.
As Cyclistic is a
fictional company, I’ll be using data provided by Motivate International
Inc under this licence.
A
concept that was taught during the course was “ROCCC”:
Reliable, Original, Comprehensive, Current,
Citied.
Looking at the individual months of data
provided: I have 13 columns:
ride_id, rideable_type,
started_at, ended_at, start_station_name, start_station_id,
end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng
& member_casual
As I eluded to before, when working with these files on Google sheets
I kept running into numerous issues. The documents kept crashing,
attempting to create new columns with filtered data or calculations kept
bringing up error’s & very slow updates even with 32GB ram.
I
spent a glorious 2 days trying to make it work, but with these datasets
having over 600,000 rows seemed too much for both Google sheets and
excel.
That’s when I considered doing the filtering on Google
BigQuery, I would be limited as the free version does not support
deleting but it would just mean some of my filtering would require
additional lines of code to exclude values not required. As I go to
upload the first file, I get 203 individual errors with the csv I try to
upload. Not ideal!
After abit of research online, alot of other people attempting this
capstone pointed out they were running into similar issues. Noting that
Power Query is helpful work around.
After learning the platform and
with some trial and error I was able to clean the data on Power Query
following these steps:
Date.From([–]) to
extract dates from started_at and
ended_at.if [column1] = [column2] then “match” else “no
match”Time.From([–]) to
extract times from started_at and
ended_at.[end time] - [start time] columns.Date.ToText([], "dddd") to get the day of the
week.Date.ToText([], "YY-mm") to extract year and
month.Time.ToText([], "h tt") to get the hour with
AM/PM.start_lat and end_lat between
41.6445 and 42.0230.start_lng and end_lng between
-87.9401 and -87.5237.
After completing the filtering process, I transferred the newly
filtered data to individual sheets that were organized by month for a
quick review. The original sheets named in the format
202309-divvy-tripdata, were saved separately with new file
names corresponding to the original titles, such as
September-2023. I created a couple of pivot tables for selected
months so I can get some initial interpretations on what the results
look like.
Using the pivot tables I was
able to mock up some visualisations on Excel that give me some initial
understanding on the what the data story would look like. It was
actually at this point I had some questions on the data, that would lead
to my assumption on the Comprehensive flaw. I’ve left a
clue in the two examples above.
Orignally the plan was to use my newly filtered data to upload to R programming for analysis, but due to my concern identified in the Process phase I decided to re do it all on R. This would give me the chance to work with R, ensure my work is bulletproof and have some fun with it.
In the code box above is the entirety of my script that I used
within R, which showcases examples of me using commands from these
packages: tidyverse, geosphere & scales.
I created 2 tables. One for high duration trips of over 4hours &
the other for duration for trips under 3 minutes of length.
Here are the results
| member_casual | day | time | distance | duration |
|---|---|---|---|---|
| casual | Sunday | 4PM | 0.0000000 | 267.4667 |
| member | Friday | 5PM | 2.2859228 | 325.0667 |
| casual | Friday | 11AM | 6.1936625 | 528.8833 |
| casual | Friday | 6PM | 1.3598795 | 249.5333 |
| casual | Sunday | 12PM | 1.6225216 | 269.9833 |
| casual | Friday | 11AM | 1.6301813 | 515.9167 |
| member | Thursday | 10AM | 0.5828608 | 255.6833 |
| casual | Friday | 5PM | 2.4017389 | 285.4000 |
| casual | Thursday | 4AM | 4.0835123 | 490.6333 |
| casual | Saturday | 4PM | 0.9094301 | 268.3667 |
| member | Wednesday | 10AM | 0.0000000 | 280.5000 |
| member | Friday | 9AM | 1.1230843 | 306.9000 |
| member | Friday | 6AM | 3.9930711 | 435.3833 |
| casual | Friday | 5PM | 2.4017389 | 283.6167 |
| casual | Thursday | 8AM | 0.0000000 | 267.4833 |
| casual | Friday | 6AM | 0.0000000 | 249.1667 |
| casual | Saturday | 2AM | 0.0000000 | 676.9833 |
| casual | Saturday | 5PM | 1.5309559 | 255.7833 |
| member | Wednesday | 5AM | 1.0702981 | 327.8167 |
| member | Thursday | 4PM | 0.4067669 | 414.1500 |
| casual | Saturday | 5PM | 2.1320309 | 250.2000 |
| member | Thursday | 6PM | 2.4910387 | 250.9833 |
| member | Saturday | 9AM | 12.2693843 | 406.0833 |
| member | Sunday | 8AM | 6.9500147 | 395.9167 |
| casual | Sunday | 8AM | 0.0000000 | 295.2500 |
| casual | Friday | 11AM | 0.4441515 | 272.8167 |
| casual | Sunday | 12PM | 3.1591959 | 291.0667 |
| casual | Saturday | 11AM | 15.7690985 | 368.6333 |
| member | Saturday | 9AM | 1.4360204 | 479.9167 |
| member | Thursday | 3AM | 0.3197353 | 309.4667 |
| casual | Saturday | 11AM | 0.1516442 | 273.8333 |
| casual | Tuesday | 2AM | 0.0000000 | 369.9833 |
| casual | Monday | 4PM | 0.0000000 | 245.1833 |
| casual | Friday | 9AM | 2.0650160 | 257.8833 |
| casual | Saturday | 1PM | 2.8782314 | 241.1333 |
| casual | Saturday | 6AM | 0.0000000 | 242.1000 |
| casual | Saturday | 11AM | 0.6841452 | 293.7000 |
| casual | Saturday | 2AM | 1.1627881 | 607.4167 |
| casual | Friday | 4PM | 0.9984893 | 318.9000 |
| casual | Sunday | 8AM | 8.2703863 | 240.4333 |
| member | Tuesday | 10AM | 0.7533263 | 292.5000 |
| casual | Wednesday | 8AM | 4.7516953 | 421.6167 |
| member | Saturday | 1PM | 0.6143663 | 453.4333 |
| casual | Saturday | 12AM | 2.3088144 | 776.4333 |
| casual | Tuesday | 12PM | 0.0000000 | 427.8333 |
| casual | Sunday | 7AM | 1.0247790 | 686.4833 |
| casual | Saturday | 2PM | 0.0000000 | 272.6333 |
| member | Wednesday | 6AM | 0.7420484 | 590.9333 |
| casual | Sunday | 1PM | 1.6310436 | 273.3500 |
| casual | Saturday | 7AM | 0.0000000 | 270.5333 |
| casual | Saturday | 11AM | 0.0000000 | 280.6000 |
| casual | Saturday | 7AM | 0.0000000 | 302.6667 |
| casual | Saturday | 7AM | 2.1935750 | 539.3167 |
| member | Sunday | 9AM | 0.1527771 | 348.9500 |
| casual | Saturday | 11AM | 6.2720064 | 378.7500 |
| casual | Thursday | 6PM | 1.2070448 | 249.5667 |
| casual | Friday | 8AM | 3.7433061 | 318.9000 |
| casual | Saturday | 1AM | 2.9389548 | 589.4167 |
| casual | Sunday | 2PM | 0.8852420 | 245.2500 |
| casual | Sunday | 2PM | 0.8852420 | 256.7833 |
| casual | Saturday | 2AM | 0.8975036 | 996.4667 |
| member | Thursday | 5AM | 7.8023342 | 256.4667 |
| casual | Saturday | 11AM | 3.7753634 | 476.0167 |
| member | Friday | 9AM | 3.7736168 | 299.3167 |
| member | Tuesday | 8AM | 12.9812159 | 341.4833 |
| casual | Monday | 9AM | 2.4596479 | 625.9500 |
| casual | Tuesday | 12AM | 0.0000000 | 570.8000 |
| casual | Saturday | 11AM | 0.6841452 | 294.8333 |
| casual | Thursday | 4PM | 1.4156344 | 261.9000 |
| member | Sunday | 6PM | 4.9957001 | 271.2833 |
| casual | Sunday | 9AM | 1.2094589 | 357.9167 |
| casual | Thursday | 11AM | 2.9026262 | 499.5833 |
| member | Tuesday | 4AM | 1.0912617 | 826.5667 |
| casual | Monday | 11AM | 0.7881408 | 272.3000 |
| member | Friday | 10AM | 1.5188059 | 446.4000 |
| casual | Friday | 6AM | 0.0000000 | 373.4500 |
| member | Friday | 3PM | 9.2963330 | 335.6667 |
| casual | Friday | 9AM | 1.2888765 | 418.9833 |
| casual | Saturday | 12AM | 0.4151005 | 1231.7833 |
| member | Thursday | 1PM | 0.0000000 | 307.6333 |
| casual | Friday | 1AM | 0.5521328 | 303.2667 |
| member | Thursday | 11AM | 2.6761359 | 341.8667 |
| casual | Saturday | 3PM | 1.5046137 | 308.8667 |
| member | Monday | 8AM | 1.8319554 | 423.6167 |
| member | Sunday | 1PM | 1.3937289 | 291.4667 |
| casual | Saturday | 6PM | 0.9404399 | 246.1500 |
| member | Saturday | 12PM | 6.7687302 | 259.8000 |
| casual | Monday | 10AM | 0.0000000 | 349.3500 |
| casual | Tuesday | 8AM | 1.2505911 | 310.7667 |
| casual | Monday | 11AM | 2.3288199 | 246.3333 |
| casual | Saturday | 5PM | 2.3532147 | 312.5000 |
| member | Monday | 8AM | 1.1105557 | 271.7500 |
| casual | Sunday | 2PM | 1.9878884 | 377.3000 |
| casual | Friday | 6PM | 5.2870733 | 240.2500 |
| casual | Friday | 4PM | 2.0296840 | 403.0333 |
| casual | Saturday | 11AM | 0.0000000 | 347.9333 |
| casual | Friday | 8AM | 1.0504544 | 260.6833 |
| casual | Friday | 8AM | 1.0504544 | 264.8667 |
| member | Thursday | 11AM | 0.4104641 | 287.0833 |
| casual | Friday | 1AM | 0.0000000 | 733.8667 |
| casual | Wednesday | 4PM | 0.0000000 | 242.7667 |
| casual | Friday | 11AM | 0.0000000 | 258.8167 |
| casual | Thursday | 10AM | 1.4269009 | 423.0667 |
| casual | Saturday | 11AM | 1.7104408 | 461.7667 |
| casual | Sunday | 11AM | 1.0161558 | 473.0667 |
| casual | Wednesday | 10AM | 4.1152733 | 488.3500 |
| casual | Friday | 12AM | 0.3369670 | 972.0667 |
| casual | Friday | 12PM | 0.0000000 | 396.0667 |
| casual | Friday | 12PM | 0.0000000 | 396.4000 |
| casual | Thursday | 2PM | 0.6085115 | 243.3833 |
| member | Saturday | 10AM | 0.0000000 | 257.2333 |
| casual | Thursday | 1PM | 1.6986505 | 298.7833 |
| casual | Wednesday | 9AM | 6.2952961 | 527.6333 |
| casual | Thursday | 8AM | 2.2356399 | 443.4667 |
| casual | Monday | 1PM | 1.4657035 | 349.0000 |
| casual | Friday | 12AM | 1.2540985 | 867.5167 |
| casual | Saturday | 2PM | 0.0000000 | 297.5500 |
| member | Monday | 6AM | 1.6454806 | 286.4833 |
| casual | Saturday | 1AM | 2.4898893 | 693.0500 |
| casual | Saturday | 7AM | 0.3323922 | 301.1667 |
| casual | Sunday | 1AM | 0.0000000 | 571.3000 |
| member | Saturday | 12PM | 1.6009861 | 611.8167 |
| casual | Monday | 10AM | 0.5220774 | 484.7000 |
| casual | Saturday | 1PM | 8.8259985 | 345.7333 |
| member | Thursday | 8AM | 12.8782159 | 340.2500 |
| member | Tuesday | 8AM | 0.5638712 | 426.0500 |
| member | Friday | 4PM | 1.5048510 | 346.1000 |
| casual | Sunday | 12PM | 2.4477388 | 439.3167 |
| casual | Friday | 9AM | 0.0000000 | 241.1500 |
| casual | Wednesday | 7AM | 0.0000000 | 496.1500 |
| casual | Friday | 8AM | 0.3138223 | 244.4167 |
| casual | Thursday | 9AM | 2.3694810 | 263.1667 |
| casual | Sunday | 12PM | 0.7449020 | 420.4000 |
| casual | Sunday | 5AM | 0.9006750 | 710.4667 |
| casual | Friday | 12PM | 1.5783301 | 389.8500 |
| casual | Sunday | 1PM | 1.6206909 | 380.1667 |
| casual | Wednesday | 10AM | 2.2334020 | 498.7167 |
| member | Sunday | 8AM | 0.0000000 | 622.5833 |
| casual | Saturday | 10AM | 1.6942840 | 290.2167 |
| casual | Friday | 8AM | 0.7746417 | 324.1167 |
| casual | Saturday | 7AM | 1.7526494 | 254.6167 |
| casual | Saturday | 7AM | 1.7526494 | 254.3333 |
| casual | Tuesday | 3PM | 3.3544152 | 296.8833 |
| casual | Saturday | 12AM | 0.0000000 | 345.5167 |
| casual | Sunday | 2PM | 5.2877843 | 240.1833 |
| casual | Sunday | 2PM | 8.9969430 | 270.2667 |
| casual | Wednesday | 7AM | 0.5867328 | 429.0667 |
| casual | Wednesday | 2PM | 0.0000000 | 259.7667 |
| casual | Monday | 5PM | 0.4098025 | 241.6000 |
| casual | Saturday | 2AM | 0.4112389 | 280.5333 |
| member | Tuesday | 5PM | 0.9923324 | 244.8333 |
| member | Thursday | 8AM | 16.1193560 | 424.2167 |
| member | Tuesday | 5PM | 0.4122580 | 258.4167 |
| member | Tuesday | 8AM | 1.3748681 | 466.4667 |
| casual | Saturday | 5PM | 0.0000000 | 309.6500 |
| casual | Saturday | 2AM | 0.5352304 | 1006.7833 |
| casual | Sunday | 12AM | 9.5452403 | 265.9833 |
| casual | Friday | 7AM | 3.2035292 | 284.1333 |
| casual | Thursday | 11AM | 0.2280966 | 263.7500 |
| casual | Thursday | 11AM | 0.2280966 | 468.2667 |
| member | Monday | 8AM | 2.2277700 | 534.3667 |
| casual | Sunday | 12PM | 8.9292390 | 403.1333 |
| casual | Saturday | 11AM | 0.3008148 | 297.0833 |
| casual | Saturday | 4AM | 0.3298823 | 488.0833 |
| casual | Tuesday | 10AM | 3.2028703 | 325.8667 |
| casual | Wednesday | 1PM | 3.7338897 | 354.9000 |
| casual | Saturday | 6AM | 1.3602205 | 789.2333 |
| casual | Sunday | 2AM | 0.4434909 | 590.9833 |
| casual | Friday | 4AM | 2.5142259 | 304.8500 |
| casual | Saturday | 3AM | 1.3345599 | 537.4000 |
| casual | Saturday | 12AM | 0.9374323 | 743.3667 |
| casual | Monday | 12PM | 7.2737340 | 292.4500 |
| casual | Thursday | 4PM | 1.9895150 | 244.6333 |
| casual | Friday | 9AM | 0.0000000 | 339.2167 |
| casual | Wednesday | 3PM | 1.9895150 | 509.6500 |
| casual | Friday | 12PM | 5.6805280 | 390.3167 |
| casual | Saturday | 11AM | 15.7690985 | 367.7500 |
| casual | Sunday | 12PM | 1.8780596 | 410.9500 |
| casual | Monday | 10AM | 10.8390537 | 424.9333 |
| casual | Monday | 12PM | 0.0000000 | 241.5833 |
| casual | Sunday | 11AM | 1.8242200 | 445.0000 |
| member | Saturday | 12PM | 1.3805572 | 298.5000 |
| casual | Sunday | 1PM | 0.8372766 | 340.8333 |
| member | Friday | 11AM | 2.0960584 | 412.7000 |
| member | Thursday | 12PM | 1.4279265 | 426.0000 |
| casual | Friday | 5PM | 6.2873850 | 251.9833 |
| casual | Tuesday | 1PM | 0.0000000 | 599.9667 |
| casual | Saturday | 4PM | 0.0000000 | 245.0833 |
| casual | Thursday | 10AM | 0.0000000 | 271.2833 |
| casual | Thursday | 10AM | 0.0000000 | 270.7500 |
| member | Wednesday | 9AM | 0.7047275 | 364.9833 |
| casual | Friday | 11AM | 0.0000000 | 272.8167 |
| casual | Friday | 3AM | 0.9330142 | 247.2833 |
| casual | Saturday | 3AM | 3.3805878 | 392.7333 |
| member | Saturday | 4PM | 1.6871995 | 260.5000 |
| casual | Saturday | 12AM | 0.0000000 | 806.0833 |
| member | Sunday | 5AM | 4.4027706 | 370.5333 |
| member | Wednesday | 9AM | 0.0000000 | 330.4333 |
| member | Thursday | 12PM | 0.0000000 | 357.3333 |
| casual | Wednesday | 9AM | 1.9123130 | 262.6167 |
In the table above is a sample of the Rides over 4 hours in
duration, if you scroll you will see majority of the examples where
distance covered is not even a kilometer or worse 0. The explanation is
simple, riders are returning the bike from where they are picking them
up from. This is why I feel the data actually fails in the
comprehensiveness part now. As a junior analyst in this position, it
would be wise to alert my team of this. Getting their opinion on how I
should proceed, given the situation I will continue with my analysis and
leave out all results about distance from my analysis but bring it up in
my report.
This is somewhat of a rookie mistake on my part I
should have thought that through, but on the bright side I was able to
notice it during my process phase, investigate further & figure it
out.
| member_casual | day | time | distance | duration |
|---|---|---|---|---|
| member | Thursday | 10AM | 0.6416725 | 2.3666667 |
| member | Saturday | 2PM | 0.0000000 | 0.4833333 |
| member | Saturday | 2PM | 0.0000000 | 0.0500000 |
| member | Sunday | 6PM | 0.0072693 | 0.7833333 |
| member | Sunday | 12PM | 0.0084922 | 1.7333333 |
| casual | Tuesday | 3PM | 0.9837643 | 2.6500000 |
| member | Thursday | 4PM | 0.6314039 | 2.4500000 |
| member | Tuesday | 5PM | 0.0000000 | 1.0333333 |
| member | Saturday | 8AM | 0.0000000 | 2.7833333 |
| casual | Wednesday | 1PM | 0.3599641 | 2.3166667 |
| member | Sunday | 8AM | 0.5882149 | 2.0666667 |
| member | Wednesday | 8AM | 0.5969045 | 2.0500000 |
| member | Wednesday | 8AM | 0.8836357 | 2.9666667 |
| member | Thursday | 9AM | 0.4083844 | 2.0333333 |
| member | Saturday | 12AM | 0.0000000 | 1.8000000 |
| casual | Saturday | 10PM | 0.0130298 | 0.7500000 |
| casual | Saturday | 6PM | 0.0000000 | 0.1500000 |
| member | Sunday | 2PM | 0.0000000 | 1.0833333 |
| member | Thursday | 4PM | 0.0000000 | 0.4833333 |
| member | Wednesday | 4PM | 0.2734735 | 1.4833333 |
| member | Monday | 7PM | 0.6962600 | 2.6833333 |
| member | Friday | 3PM | 0.0000000 | 1.4500000 |
| casual | Sunday | 7PM | 0.0000000 | 0.4166667 |
| casual | Thursday | 11AM | 0.0000000 | 0.1833333 |
| casual | Sunday | 11PM | 0.0194948 | 0.3500000 |
| member | Tuesday | 1AM | 0.0150922 | 0.2166667 |
| member | Friday | 7AM | 0.7618581 | 2.9833333 |
| casual | Friday | 11PM | 0.0000000 | 1.3000000 |
| casual | Sunday | 7PM | 0.0018778 | 1.1333333 |
| casual | Friday | 4PM | 0.0024142 | 1.2000000 |
| casual | Saturday | 1AM | 0.0000000 | 1.5833333 |
| member | Wednesday | 2PM | 0.0078976 | 0.7333333 |
| member | Wednesday | 10AM | 0.0130662 | 0.9833333 |
| member | Wednesday | 11PM | 0.4251317 | 1.7333333 |
| member | Saturday | 1PM | 0.0057063 | 0.1000000 |
| member | Sunday | 6PM | 0.0103789 | 0.7666667 |
| member | Tuesday | 1PM | 0.0104043 | 1.0500000 |
| member | Tuesday | 8PM | 0.3794903 | 2.3666667 |
| member | Monday | 8AM | 0.4466237 | 2.4833333 |
| member | Tuesday | 4PM | 0.4495019 | 2.3833333 |
| member | Monday | 7PM | 0.4499959 | 2.0500000 |
| member | Friday | 9AM | 0.6718145 | 2.6500000 |
| member | Monday | 3PM | 0.5335743 | 1.7000000 |
| member | Wednesday | 3PM | 0.5375019 | 1.8666667 |
| member | Wednesday | 8AM | 0.4251317 | 1.8833333 |
| casual | Saturday | 10PM | 0.0000000 | 0.6500000 |
| casual | Wednesday | 9PM | 0.0000000 | 1.3500000 |
| member | Monday | 11PM | 0.0000000 | 0.2500000 |
| member | Wednesday | 4PM | 0.0092629 | 0.5166667 |
| member | Saturday | 1PM | 0.0000000 | 1.6500000 |
| member | Thursday | 7AM | 0.0106417 | 1.0166667 |
| member | Monday | 3PM | 0.0224739 | 0.8166667 |
| casual | Saturday | 2AM | 0.6157032 | 2.1833333 |
| member | Wednesday | 7PM | 0.2841614 | 2.0500000 |
| member | Monday | 7AM | 0.0065547 | 0.7333333 |
| casual | Tuesday | 8AM | 0.4599092 | 2.4833333 |
| member | Wednesday | 11AM | 0.4599092 | 2.1666667 |
| member | Thursday | 11PM | 0.0479219 | 2.2166667 |
| member | Friday | 11PM | 0.0000000 | 0.1000000 |
| member | Saturday | 11PM | 0.0079514 | 0.8500000 |
| casual | Wednesday | 12PM | 0.0000000 | 0.2666667 |
| member | Tuesday | 4PM | 0.0043538 | 1.9166667 |
| member | Monday | 8AM | 0.0000000 | 0.8666667 |
| member | Monday | 3PM | 0.0000000 | 0.8333333 |
| member | Saturday | 11PM | 0.3218281 | 1.6000000 |
| casual | Monday | 12PM | 0.0067973 | 1.2333333 |
| casual | Monday | 1PM | 0.0037224 | 0.4000000 |
| casual | Saturday | 5PM | 0.0415932 | 0.6666667 |
| casual | Saturday | 2PM | 0.0000000 | 0.7500000 |
| casual | Saturday | 2PM | 0.0000000 | 0.5000000 |
| casual | Friday | 9AM | 0.0000000 | 0.7333333 |
| member | Saturday | 8PM | 0.0127440 | 1.8166667 |
| member | Friday | 10PM | 0.3197353 | 1.9500000 |
| member | Monday | 8PM | 0.3197353 | 1.1833333 |
| casual | Friday | 5PM | 0.3197353 | 2.0333333 |
| member | Sunday | 9PM | 0.4705321 | 2.7333333 |
| member | Sunday | 3PM | 0.7752734 | 2.8833333 |
| member | Saturday | 1PM | 0.6203533 | 2.5333333 |
| member | Friday | 5PM | 0.3980277 | 2.4333333 |
| member | Monday | 3PM | 0.0002216 | 2.4833333 |
| casual | Friday | 8AM | 0.0000000 | 0.9666667 |
| casual | Friday | 10PM | 0.0000000 | 0.3666667 |
| member | Sunday | 10AM | 0.0000000 | 0.3000000 |
| member | Monday | 12PM | 0.2075367 | 2.1666667 |
| member | Monday | 7PM | 0.5569097 | 2.7333333 |
| member | Friday | 12PM | 0.7656174 | 2.3666667 |
| member | Monday | 7AM | 0.7616406 | 2.9166667 |
| member | Tuesday | 9AM | 0.3805234 | 2.5833333 |
| member | Thursday | 9AM | 0.3805234 | 2.4500000 |
| member | Friday | 12PM | 0.0000000 | 0.9000000 |
| member | Friday | 8AM | 0.0156308 | 0.6000000 |
| member | Sunday | 4PM | 0.0055419 | 1.0666667 |
| member | Friday | 7PM | 0.1003547 | 0.1500000 |
| member | Saturday | 12PM | 0.5810330 | 2.6833333 |
| member | Thursday | 2PM | 0.4886699 | 2.9000000 |
| member | Tuesday | 2PM | 0.5552736 | 2.4000000 |
| member | Friday | 1PM | 0.4825366 | 2.6000000 |
| member | Friday | 8PM | 0.2280966 | 2.9000000 |
| member | Friday | 9AM | 0.0000000 | 0.2666667 |
| member | Friday | 1AM | 0.0000000 | 0.0666667 |
| member | Sunday | 3PM | 0.0000000 | 0.5666667 |
| casual | Saturday | 4PM | 0.0000000 | 0.9333333 |
| casual | Saturday | 4PM | 0.0000000 | 1.6833333 |
| casual | Friday | 11PM | 0.0117703 | 0.6333333 |
| casual | Friday | 6PM | 0.0000000 | 0.4333333 |
| casual | Friday | 4PM | 0.0000000 | 0.8000000 |
| casual | Monday | 6PM | 0.0000000 | 0.5333333 |
| casual | Sunday | 1AM | 0.1100531 | 1.8000000 |
| member | Monday | 6AM | 0.8033813 | 2.8833333 |
| member | Saturday | 1PM | 0.8430592 | 2.9166667 |
| member | Thursday | 8AM | 0.4102942 | 2.1833333 |
| casual | Tuesday | 2PM | 0.3287639 | 2.0666667 |
| casual | Sunday | 7AM | 0.5378468 | 2.6333333 |
| member | Friday | 9AM | 0.4910749 | 2.1833333 |
| member | Friday | 3PM | 0.5638712 | 2.8666667 |
| member | Sunday | 11AM | 0.4665433 | 2.7500000 |
| member | Saturday | 2PM | 0.0058531 | 0.0166667 |
| member | Saturday | 2PM | 0.0052068 | 0.0166667 |
| member | Tuesday | 5AM | 0.0000000 | 1.1166667 |
| member | Friday | 6PM | 0.6061398 | 2.1833333 |
| member | Friday | 10PM | 0.5558077 | 2.5333333 |
| member | Wednesday | 3PM | 0.0000000 | 0.1000000 |
| member | Friday | 2PM | 0.0798099 | 1.6000000 |
| member | Sunday | 3PM | 0.7812582 | 2.0666667 |
| member | Sunday | 12AM | 0.0104379 | 0.5000000 |
| member | Thursday | 4PM | 0.0097392 | 1.0666667 |
| casual | Friday | 12AM | 0.0000000 | 0.0333333 |
| member | Thursday | 2PM | 0.6733444 | 2.9500000 |
| member | Sunday | 2PM | 0.6701441 | 2.7666667 |
| member | Friday | 7AM | 0.4018058 | 1.8166667 |
| member | Sunday | 6PM | 0.4018058 | 1.8500000 |
| member | Saturday | 7PM | 0.4018058 | 2.4166667 |
| member | Monday | 1PM | 0.2939542 | 1.7166667 |
| member | Wednesday | 8PM | 0.3151344 | 1.9333333 |
| casual | Wednesday | 11PM | 0.5521328 | 2.7500000 |
| member | Tuesday | 5PM | 0.5457753 | 1.8333333 |
| member | Tuesday | 8AM | 0.3597399 | 2.0833333 |
| member | Friday | 10PM | 0.5277459 | 2.2000000 |
| member | Tuesday | 2PM | 0.5242002 | 1.8166667 |
| member | Thursday | 7AM | 0.3433267 | 1.8166667 |
| member | Thursday | 5PM | 0.5247202 | 2.4833333 |
| casual | Saturday | 5PM | 0.9577617 | 2.8333333 |
| member | Tuesday | 10AM | 0.4937251 | 2.7166667 |
| member | Saturday | 8AM | 0.2146445 | 1.3166667 |
| member | Wednesday | 10AM | 0.0000000 | 1.6333333 |
| member | Friday | 11AM | 0.2575454 | 1.6833333 |
| member | Thursday | 2PM | 0.2918513 | 2.7166667 |
| member | Saturday | 4PM | 0.4685794 | 2.4166667 |
| member | Wednesday | 6PM | 0.3967977 | 2.2500000 |
| member | Thursday | 3PM | 0.8579785 | 2.9500000 |
| member | Friday | 7AM | 0.4406271 | 2.1666667 |
| member | Tuesday | 12PM | 0.4410127 | 2.0666667 |
| member | Thursday | 8PM | 0.4406271 | 2.7833333 |
| member | Thursday | 10PM | 0.0000000 | 0.0666667 |
| member | Sunday | 5PM | 0.0000000 | 1.3500000 |
| member | Sunday | 4PM | 0.0007547 | 1.1000000 |
| member | Monday | 1PM | 0.3792719 | 1.6500000 |
| member | Wednesday | 10AM | 0.4892355 | 2.3000000 |
| member | Friday | 1PM | 0.4892355 | 2.6666667 |
| member | Friday | 1PM | 0.4892355 | 1.7500000 |
| member | Thursday | 6AM | 0.4892355 | 1.9500000 |
| member | Friday | 2PM | 0.5395504 | 2.8666667 |
| member | Sunday | 12PM | 0.5395504 | 2.8166667 |
| member | Friday | 2PM | 0.5545163 | 2.1000000 |
| member | Tuesday | 9AM | 0.5234394 | 2.9833333 |
| member | Sunday | 8AM | 0.8429966 | 2.7166667 |
| member | Tuesday | 9AM | 0.0000000 | 0.0500000 |
| casual | Monday | 7PM | 0.0000000 | 0.1666667 |
| casual | Monday | 7PM | 0.0000000 | 0.0500000 |
| member | Sunday | 1PM | 0.0079737 | 0.3833333 |
| member | Sunday | 3PM | 0.3372566 | 2.2666667 |
| member | Wednesday | 9PM | 0.0132314 | 0.2333333 |
| casual | Friday | 8AM | 0.0059751 | 0.8166667 |
| member | Monday | 4PM | 0.0000000 | 0.3000000 |
| casual | Sunday | 6PM | 0.0135398 | 0.7166667 |
| casual | Friday | 4PM | 0.0103112 | 2.0666667 |
| casual | Saturday | 11AM | 0.0000000 | 2.0833333 |
| casual | Wednesday | 4PM | 0.0052275 | 2.2333333 |
| member | Friday | 2PM | 0.3516273 | 1.8666667 |
| member | Tuesday | 8PM | 0.3766763 | 1.7833333 |
| member | Friday | 3PM | 0.0000000 | 1.7833333 |
| member | Saturday | 5PM | 0.0000000 | 0.9500000 |
| member | Wednesday | 6PM | 0.5629344 | 2.5000000 |
| member | Saturday | 4PM | 0.5988432 | 2.5666667 |
| member | Monday | 6PM | 0.3835234 | 1.2666667 |
| member | Friday | 5PM | 0.0000000 | 0.3333333 |
| member | Saturday | 4PM | 0.3920508 | 2.6000000 |
| member | Sunday | 5PM | 0.3920508 | 2.2500000 |
| member | Thursday | 7AM | 0.5988432 | 2.3500000 |
| member | Tuesday | 5PM | 0.0000000 | 0.2333333 |
| member | Wednesday | 11PM | 0.0100569 | 1.2333333 |
| member | Wednesday | 5PM | 0.4748958 | 2.7833333 |
| member | Monday | 4PM | 0.3098704 | 2.5833333 |
| member | Wednesday | 2AM | 0.0000000 | 0.5500000 |
| member | Tuesday | 8AM | 0.0092253 | 0.6166667 |
| member | Wednesday | 9PM | 0.0000000 | 0.4833333 |
| member | Friday | 1PM | 0.4437310 | 2.3666667 |
| member | Saturday | 8PM | 0.0398147 | 0.7500000 |
| member | Tuesday | 8PM | 0.0022716 | 2.0500000 |
| casual | Saturday | 4PM | 0.0000000 | 1.2333333 |
At the 0 mark on the X-axis, the count doesn’t even start at 0. Most
trips under 1 minute are by casual users, suggesting they might be
trialing the service without fully using it. To better analyse
differences in rider type, I’ll exclude trips under 1 minute, which
total 54,115 values. However, this high level of curiosity will be
noted.
It is at this point I switch from
Cyclistic_data to Cyclistic Review, by
doing so I’m able to keep a file of the combined sheets that are cleaned
much like the files I created in the process phase and proceed to
Analyse phase with a different file.
Finally getting that out of the way I was able to build some tables
that will help me look at the data better.
Total counts and average trip duration
The combined files were at the beginning 5,699,639 observation. By the
time I get to this stage the new count is 4,091,950.
Which will be the total amount of rides I will be working with for my
analysis.
-2,650,101 rides were completed by members
with a 12 minute average ride duration
-1,441,849
rides were completed by casuals with a 23 minute
average ride duration
This COULD suggest that casual riders use
the service more for leisure or longer trips, while members might use it
more for commuting or shorter, frequent trips.
Top ten starting and ending locations
Both member and casual riders show a ton of activity around the Chicago Loop,
which happens to be the second largest business district in the United
States as I’ve recently discovered. This area also includes tourist
spots, lakefront access points, and residential areas.
For members, the top start locations make up 7.36%
of all member rides, and the top end locations account for
7.44% of rides respectively. This really highlights the
frequent commuting happening in the business districts and residential
areas. High traffic at main transport hubs just adds to this trend,
showing how biking is such a crucial part of daily commutes in
chicago!
On the flip side, casual riders are all about fun. The top start
locations for casual riders make up 14.56% of all
rides, and the top end locations account for 14.83% of
all rides. Tourist spots, scenic areas, and popular attractions are main
highlights of these locations. This could indicate a whole different
usage, focusing on leisure and recreational activities.
In addition
to this, while creating these tables i found out the distinct number of
stations is now 1,646 from the original
692! Despite being only 0.61% of the
total 1,646 stations, the top 10 locations account for a large portion
of the rides.
What bike
options are riders using more
Classic
Bikes: 67% of total rides
-Casual: 35%
-Member: 65%
Electric Bikes: 33% of total rides
Casual: 36%
Member: 64%
Between the two options, Classic bikes are
used more with very little difference between the member groups.
Ride density during the week
Firstly I
think its important to separate these two graphs so we can see the
difference between the weekday period of Monday - Friday & then
Saturday/Sunday
Out of a total of 4,091,950 trips, 2,886,597
occurred during the week, while 1,205,353 took place
over the weekend. Note that the graph sizes should not be interpreted as
proportional representations of the overall count; they solely reflect
the density of the individual counts in the separated graphs.
From
the density we can see a more corporate based usage during the week, and
recreational weekend usage. Further eluding that a lot of the riders
might be using the bike for work purposes.
What
does a yearly review look like?
Comparing members to
casuals the peak season for counts is generally the same split but a
HUGE drop coming into the winter. Casual riders experienced a
chilling 86% drop in rides from October 2023
(124,703 rides) to January 2024 (16,948 rides).
Member riders also
saw a decrease, though less harsh, with a 64%
reduction from September 2023 (265,546 rides) to January 2024 (93,769
rides).
Below shows density of both member & casual rides
throughout the year.
Something a little bit
extra
As someone who part takes in a community based
sport, I know for a fact that when other people are involved commitment
is alot easier. So after getting all these graph I did a quick google of
the community based events for cycling in Chicago and found a couple of
annual events, I picked two in September of 2023 to look at bike usage
averages for the specific time period compared to the rest of the month.
2023
Let’s Ride, Illnois! - With 9 out of the 17 rides that take place in
Chicago during the period 8th - 17th of September.
Apple Cider Century An
event on the last Sunday of September 24-09-2023.
With Let’s Ride, Illnois! the 8th-17th equating to around 33% of
the month, the percentage of the trips for that part of the month
actually falls short.
With Apple Cider Century it wasn’t the lowest
Sunday during the month, but usage doesn’t suggest a spike either.
While it is not definite it doesn’t seem like bike riding events in
Chicago actually impact the usage of Cyclistic bikes.
This is the part where I make company suggestions which is difficult for me, the issue I see with my suggestions will be the scope. As a junior analyst I would be asking my colleague’s if a suggestion is good or not, if they have already been trialed already or is the company at that stage yet when it comes to outreach. So please take these suggestions as more of the creative sparks that I have.
Even if you don’t see immediate response, if you see an increase in your casual rides it might indicate casual members are using the product more. Consider a time you’ve been at the grocery store and you buy the same single item product you buy every week, when right next to it you have the value pack and you think to yourself “I really should just buy the big pack”
While this is the end of the capstone, it doesn’t mean you wont see
more of me!
I’ve learnt alot throughout this course & it only
makes me excited to learn & share more.
Key skills I’ve
learnt from this course include:
-Understanding of
stakeholder needs, using evaluation questions to get data driven results
- Best practices when handling information: like proper
documentation, storing of files, importance of changelog’s, data
cleaning & aggregation.
-Using spreadsheets more effectively
and integrating them with other platforms
-Built solid foundation
knowledge of SQL, I will definitely need more exposure to it going
forward.
-Fundamental skills on Tableau for visualisation. Building
informative visualisations is not easy, much like SQL I will definitely
have to work on it.
-Fallen in love with R programming, looking at
blocks of code with a blank face to now reading it like its a difficult
book but gradually picking things up. I’ve effectively completed
majority of this capestone on R including this write up itself being an
Rmarkdown report.
Thank you for taking the time to look over
my first analysis, given the opportunity I would love feedback as I’m
very new to this all.
Til next time